Abstract: Entropy-based image thresholding has received considerable interest in recent years. Two types of entropy are generally used as thresholding criteria: Shannon’s entropy and relative entropy, also known as Kullback-Leibler information distance. The former measures uncertainty in an information source, with an optimal threshold obtained by maximising Shannon’s entropy, whereas the latter measures the information discrepancy between two different sources, with an optimal threshold obtained by minimising relative entropy. Many thresholding methods have been developed for both criteria and reported in the literature. These two entropy-based thresholding criteria have been investigated and the relationship among entropy and relative entropy thresholding methods has been explored. In particular, a survey and comparative analysis is conducted among several widely used methods that include Pun and Kapur’s maximum entropy, Kittler and Illingworth’s minimum error thresholding, Pal and Pal’s entropy thresholding and Chang et al.’s relative entropy thresholding methods. In order to objectively assess these methods, two measures, uniformity and shape, are used for performance evaluation.


1  Introduction 

Thresholding is an important technique in image segmentation, enhancement and object detection. Many methods have been reported in the literature [1-5]. Of particular interest is an information theoretic approach that is based on the concept of entropy introduced by Shannon in information theory [6]. The principle of entropy is to use uncertainty as a measure to describe the information contained in a source. The maximum information is achieved when no a priori knowledge is available, in which case it results in maximum uncertainty. For instance, if an experiment is conducted in an unknown environment that cannot be estimated a priori, a reasonable approach is to assume that all outcomes of the experiment are equally likely, to avoid introducing any possibly biased knowledge. In this situation, the maximum entropy (ME) is achieved by the maximum uncertainty. This is intuitively appealing from an information theory point of view. In other words, if one has no preference among samples resulting from an experiment, the best decision is not to introduce any biased knowledge into the decision process. Instead, all samples must be treated as equally important. In this case, the probability distribution that describes the experiment is either uniformly distributed in continuous probability space or equally likely in discrete probability space, both of which yield the ME.

Using ME as an optimal criterion for image thresholding was first proposed by Pun [7, 8]. It was later corrected and improved by Kapur et al. [9]. The concept was further generalised to Renyi’s entropy [10]. Basically, entropy-based thresholding considers an image histogram as a probability distribution, and then selects as the optimal threshold the value that yields the ME. More specifically, the best entropy-thresholded image is the one that preserves as much as possible of the information contained in the original unthresholded image in terms of Shannon’s entropy. Although such entropy thresholding seems promising, it suffers from one drawback: it does not take into account the image spatial correlation. Therefore different images with an identical histogram will result in the same threshold value. In order to mitigate this problem, two approaches were proposed in the past. Both extended the one-dimensional (1-D) image histogram to a two-dimensional (2-D) image histogram, taking account of inter-pixel spatial correlation in different ways. One approach was first proposed by Abutaleb [11], who used the original 1-D histogram and its local average to form a 2-D histogram from which a pair of optimal threshold values can be derived. Several extensions to Abutaleb’s approach have been investigated [12-16]. Another approach considers the grey-level co-occurrence matrix as a means to capture transitions between grey levels [17]. Unlike Abutaleb’s approach, which makes use of two separate threshold values, the co-occurrence matrix-based approach requires only a single threshold value. Co-occurrence matrices are often used in texture analysis. Haralick et al. [18] proposed 14 co-occurrence matrix-based texture measures to extract information for texture analysis. On the basis of the concept of the co-occurrence matrix, Pal and Pal [19] recently developed two entropy-based thresholding techniques, called local entropy (LE) and joint entropy (JE). They can be viewed as an extension of Pun and Kapur et al.’s ME approach, where the LE and the JE maximise entropies of local quadrants and joint quadrants resulting from thresholding the co-occurrence matrix, respectively. So, if we consider Pun and Kapur et al.’s approach as a first-order entropy thresholding method, Abutaleb’s method and Pal and Pal’s method can be thought of as second-order entropy thresholding methods.

The entropy-based thresholding methods discussed earlier are derived from maximisation of Shannon’s entropy. Relative entropy, also known as Kullback-Leibler information distance, directed divergence or cross entropy, has also been proposed as an alternative thresholding criterion. Two early approaches were minimum error thresholding (MET) developed by Kittler and Illingworth [20] and minimum cross entropy (MCE) developed by Li and Lee [21]. The underlying assumption of Kittler and Illingworth’s method is that the image to be thresholded can be modelled by a mixture of two Gaussian distributions with appropriate weights, in which the two Gaussian distributions describe the image background and foreground, respectively, and the weights are determined by the threshold. The desired optimal threshold is the one that produces the two-member Gaussian mixture that best matches the original 1-D image histogram, with the relative entropy used as the matching measure. Minimising relative entropy is thus equivalent to finding a two-member Gaussian mixture which has the minimal discrepancy between the histogram of the thresholded image and the original histogram. This concept was further generalised by Pal and Pal [22], in which the relative entropy and Gaussian mixture model were replaced by the divergence and a Poisson model, respectively. In contrast, Li and Lee’s approach considered a constrained thresholding problem with cross entropy used as an optimal criterion. It minimised the cross entropy subject to two constraints that the means of foreground and background must remain unchanged before and after thresholding. Unfortunately, it was shown that the MCE used in Li and Lee’s method was not actually cross entropy [23].

More recently, Chang et al. [24] developed an alternative relative entropy thresholding method that also used the relative entropy as a threshold criterion. Instead of using the image histogram, as in Kittler and Illingworth’s MET and Li and Lee’s MCE, their approach used the co-occurrence matrix and minimised the discrepancy of grey-level transitions in the co-occurrence matrix before and after an image was thresholded. Conceptually, what Pal and Pal’s approach was to Pun and Kapur et al.’s entropy thresholding method is exactly what Chang et al.’s relative entropic thresholding method was to Kittler and Illingworth’s MET and Li and Lee’s MCE. In other words, Kittler and Illingworth’s MET, Li and Lee’s MCE and Pun and Kapur et al.’s method can be considered as first-order entropy-based thresholding methods, as they only deal with the 1-D image histogram, as opposed to Pal and Pal’s and Chang et al.’s methods, which can be considered as second-order entropy-based methods due to the use of the 2-D co-occurrence matrix. The crucial difference between entropy thresholding and relative entropy thresholding is that the former maximises Shannon’s entropy, whereas the latter minimises relative entropy. Chang et al.’s approach was further improved in the work of Lee et al. [25] and was also extended to Ali-Silvey distance measures in the work of Ramac and Varshney [26]. In analogy with the way Pal and Pal extended Pun’s ME approach to local entropy and joint entropy methods, Lee et al. also extended Chang et al.’s relative entropy approach to local relative entropy (LRE) and joint relative entropy (JRE) methods. Interestingly, their derived LRE and JRE were not actually relative entropy, an error similar to that made in Li and Lee’s MCE [23]. Nonetheless, like Li and Lee’s MCE, the LRE and JRE proposed by Lee et al. [27] were also demonstrated to be reasonably good criteria.

In this paper, we investigate the entropy-based and relative entropy-based thresholding criteria and explore the relationship among entropy and relative entropy thresholding methods. In particular, we conduct a comparative study and analysis between entropy-based and relative entropy-based thresholding methods. Three new thresholding methods, global entropy (GE), LRE and JRE, are also introduced, where the LRE and JRE are correct versions of the LRE and JRE proposed by Lee et al. [27]. Interestingly, Chang et al.’s [24] method can be reinterpreted in this paper as global relative entropy (GRE), which complements the LRE and JRE. With these interpretations, the three relative entropy thresholding methods, GRE, LRE and JRE, can be considered as counterparts of GE, LE and JE in entropy thresholding methods. As many popular first-order thresholding methods have been surveyed and compared in the work of Yang et al. [1], and Abutaleb’s 2-D histogram-based approaches were discussed in the work of Yang et al. [14], there is no need to repeat their work here. Instead, this paper is primarily focused on a comparative study and analysis among Kittler and Illingworth’s MET, the three co-occurrence matrix-based entropy thresholding techniques and three relative entropy thresholding methods, plus Otsu’s [28] method. Otsu’s method is included in our study because it has been widely used and has proved to be one of the most successful techniques in image thresholding. It should be noted that Pun and Kapur et al.’s methods and Li and Lee’s MCE are not included in our study. The former was shown not to be comparable with Pal and Pal’s method and the latter performed very poorly in most of our experiments. In addition, two objective measures, uniformity and shape, suggested in Sahoo et al. [1], are introduced to evaluate their comparative performance.

2  Entropy  thresholding 

The concept of entropy has been widely used in data compression to measure the information content of an information source. Suppose that a source $X$ has $L$ source alphabets denoted by $\{x_i\}_{i=1}^{L}$ and the probability of the $i$th source alphabet $x_i$ is given by $p_i$. In this case, a source can be specified by a probability vector $p = (p_1, \ldots, p_L)$, where $p_i$ is the probability of $x_i$. For each source symbol $x_i$ with $1 \le i \le L$, we can define the so-called self-information of $x_i$ as $I(x_i) = -\log(p_i)$ [29, 30]. Such self-information $I(x_i)$ describes how much information or uncertainty is produced by a particular source alphabet $x_i$. Furthermore, because the significance of each source alphabet is also determined by how often it is generated by the source $X$, the probability of each source alphabet must be factored into the description of the information for $X$. As a consequence, an effective means to describe the information for the source $X$ is the mean of self-information over the $L$ source alphabets $\{x_i\}_{i=1}^{L}$, which turns out to be $E_X[I(X)]$. If we expand the expression of $E_X[I(X)]$ as follows, it becomes the well-known entropy.

$$H(X) = E_X[I(X)] = E_X[-\log p(X)] = \sum_{i=1}^{L} p(x_i) I(x_i) = -\sum_{i=1}^{L} p(x_i) \log p(x_i) = -\sum_{j=1}^{L} p_j \log p_j \tag{1}$$
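As a minimal illustration of (1), the following Python sketch computes the entropy of a probability vector and of an image histogram. The function names, the list-of-rows image representation and the use of the natural logarithm are assumptions made for this example only; the text does not fix a logarithm base.

```python
import math


def shannon_entropy(probabilities):
    """Entropy H = -sum_j p_j log p_j, with 0 log 0 taken as 0 (natural log)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def histogram_probabilities(image, levels):
    """Normalised grey-level histogram of an image given as a list of rows."""
    counts = [0] * levels
    for row in image:
        for grey in row:
            counts[grey] += 1
    total = sum(counts)
    return [count / total for count in counts]


# A uniform source attains the maximum entropy log(L); a skewed one does not.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(uniform), shannon_entropy(skewed))
```

For four symbols the uniform vector gives log 4, which is the largest value any four-symbol source can reach; this is the sense in which maximum uncertainty corresponds to maximum entropy.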
As an image can be viewed as an information source with a probability vector described by its grey-level image histogram, the entropy of the histogram can be used to represent a certain level of information contained in the image. Pun [7, 8] and Kapur et al. [9] used this concept to derive entropy thresholding methods, which will be referred to as ME methods. However, their approaches did not take into account the correlation among grey levels. As a result, two different images with an identical image histogram will result in the same threshold value. One way to resolve this problem is to consider the grey-level co-occurrence matrix defined in the following section, which contains the information of grey-level transitions in an image. Two approaches have been investigated in the past. One is Abutaleb’s 2-D histogram, which takes advantage of the correlation between a grey level value and its local average to capture image spatial correlation. Another is the co-occurrence matrix, which records transitions between every pair of grey levels in an image. As Abutaleb’s 2-D histogram-based approaches require two separate threshold values, they are outside our scope and will not be discussed here. Instead, we will primarily focus in this paper on single threshold value-based approaches.

2.1 Grey-level co-occurrence matrix

Assume that an image has a size of $M \times N$ with $L$ grey levels denoted by $G = \{0, 1, \ldots, L-1\}$. Let $f(x, y)$ be the grey level of the pixel at the spatial location $(x, y)$. Then the image can be represented by an $M \times N$ matrix $F = [f(x, y)]_{M \times N}$. A 1-D image histogram resulting from $f(x, y)$ and the image matrix $F$ is a distribution of the $L$ grey levels in $G$ in accordance with the frequency of their occurrence. Unfortunately, such a 1-D histogram discards the correlation among grey levels, which is crucial in image thresholding and segmentation. In order to resolve this issue, a 2-D histogram that can describe and capture image correlation is necessary to improve thresholding performance. One such approach is the use of the co-occurrence matrix.

A co-occurrence matrix of an image is an $L \times L$ square matrix, denoted by $W = [t_{ij}]_{L \times L}$, whose elements are specified by the numbers of transitions between all pairs of grey levels in $G = \{0, 1, \ldots, L-1\}$ in a particular way. For each image pixel at spatial co-ordinate $(m, n)$ with its grey level specified by $f(m, n)$, it considers the nearest four neighbouring pixels at locations $(m-1, n)$, $(m+1, n)$, $(m, n-1)$ and $(m, n+1)$, referred to as the 4-adjacency in the work of Gonzalez and Woods [17]. The co-occurrence matrix developed by Haralick et al. [18] is designed to record the grey-level changes by comparing the grey level $f(m, n)$ with the grey levels of these neighbours, $f(m-1, n)$, $f(m+1, n)$, $f(m, n-1)$ and $f(m, n+1)$. It has been shown that there is no significant difference between considering all four neighbouring pixels and using only two neighbouring pixels at the horizontal and vertical directions in the 4-adjacency of a pixel. One widely used co-occurrence matrix is an asymmetric matrix that only considers the grey-level transitions between two adjacent pixels. More specifically, let $t_{ij}$ be the $(i, j)$th element of the co-occurrence matrix $W$. Following the definition given in the work of Chang et al. [24]

$$t_{ij} = \sum_{m=1}^{M} \sum_{n=1}^{N} \delta_{mn} \tag{2}$$

and

$$\delta_{mn} = \begin{cases} 1, & \text{if } f(m, n) = i \text{ and } f(m+1, n) = j \text{ and/or } f(m, n) = i \text{ and } f(m, n+1) = j \\ 0, & \text{otherwise} \end{cases}$$

where ‘and/or’ used in the $\delta_{mn}$ defined earlier implies ‘either or both’.

By normalising by the total number of transitions in the co-occurrence matrix, a desired transition probability from grey level $i$ to grey level $j$ is obtained by

$$p_{ij} = \frac{t_{ij}}{\sum_{k=0}^{L-1} \sum_{l=0}^{L-1} t_{kl}} \tag{3}$$

For more details on the co-occurrence matrix, we refer to previous studies [3, 17, 18].
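The definitions in (2) and (3) can be sketched in Python as follows. The function names and the list-of-rows image layout (first index taken as the row m) are illustrative assumptions, as is the treatment of the image border: pixels in the last row or column simply have no lower or right neighbour. The ‘either or both’ reading is implemented by counting a pair (i, j) once per pixel even when both neighbours have grey level j.

```python
def cooccurrence_matrix(image, levels):
    """Count grey-level transitions t[i][j] following (2).

    For each pixel with grey level i, the pair (i, j) is counted once if the
    pixel below, the pixel to the right, or both have grey level j.
    """
    rows, cols = len(image), len(image[0])
    t = [[0] * levels for _ in range(levels)]
    for m in range(rows):
        for n in range(cols):
            i = image[m][n]
            targets = set()
            if m + 1 < rows:  # neighbour f(m + 1, n)
                targets.add(image[m + 1][n])
            if n + 1 < cols:  # neighbour f(m, n + 1)
                targets.add(image[m][n + 1])
            for j in targets:
                t[i][j] += 1
    return t


def transition_probabilities(t):
    """Normalise the counts into transition probabilities p[i][j] as in (3)."""
    total = sum(sum(row) for row in t)
    return [[count / total for count in row] for row in t]


# Two 2 x 2 images with the same histogram but different transitions.
stripes = [[0, 0], [1, 1]]
checker = [[0, 1], [1, 0]]
print(cooccurrence_matrix(stripes, 2))  # [[1, 2], [0, 1]]
print(cooccurrence_matrix(checker, 2))  # [[0, 1], [2, 0]]
```

Both toy images contain two pixels of each grey level, so any histogram-based criterion treats them identically, while their co-occurrence matrices differ.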


2.2  Quadrants  of  the  co-occurrence  matrix 


Let $t$ be a value used to threshold an image. It partitions a co-occurrence matrix into four quadrants, namely, A, B, C and D, shown in Fig. 1. These four quadrants can be further grouped into two classes, referred to as local quadrants and joint quadrants. We assume that pixels with grey levels above the threshold are assigned to the foreground (corresponding to objects), and those equal to or below the threshold are assigned to the background. Then quadrants A and C correspond to local transitions within background and foreground, respectively, whereas quadrants B and D are joint quadrants which represent joint transitions across boundaries between background and foreground. The probabilities associated with each quadrant are then given by

$$P_A^t = \sum_{i=0}^{t} \sum_{j=0}^{t} p_{ij}, \quad P_B^t = \sum_{i=0}^{t} \sum_{j=t+1}^{L-1} p_{ij}, \quad P_C^t = \sum_{i=t+1}^{L-1} \sum_{j=t+1}^{L-1} p_{ij}, \quad P_D^t = \sum_{i=t+1}^{L-1} \sum_{j=0}^{t} p_{ij} \tag{4}$$

The probabilities of grey-level transition within each particular quadrant can be further obtained by the so-called ‘cell probabilities’

$$p_{ij|A}^t = \frac{p_{ij}}{P_A^t}, \quad p_{ij|B}^t = \frac{p_{ij}}{P_B^t}, \quad p_{ij|C}^t = \frac{p_{ij}}{P_C^t}, \quad p_{ij|D}^t = \frac{p_{ij}}{P_D^t} \tag{5}$$
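A short Python sketch of the quadrant probabilities in (4) and the cell probabilities in (5) is given below. It assumes that the transition probabilities are stored as a nested list `p[i][j]`, that quadrant A holds background-to-background transitions, B background-to-foreground, C foreground-to-foreground and D foreground-to-background, and that an empty quadrant simply has no cell probabilities; the function names are hypothetical.

```python
def quadrant_probabilities(p, t):
    """Quadrant masses P_A, P_B, P_C and P_D of (4) for threshold t."""
    levels = len(p)
    background = range(t + 1)          # grey levels 0..t
    foreground = range(t + 1, levels)  # grey levels t+1..L-1
    return {
        "A": sum(p[i][j] for i in background for j in background),
        "B": sum(p[i][j] for i in background for j in foreground),
        "C": sum(p[i][j] for i in foreground for j in foreground),
        "D": sum(p[i][j] for i in foreground for j in background),
    }


def cell_probabilities(p, t, quadrant):
    """Cell probabilities p_ij / P_Q of (5) for one quadrant, keyed by (i, j)."""
    levels = len(p)
    background, foreground = range(t + 1), range(t + 1, levels)
    rows, cols = {
        "A": (background, background),
        "B": (background, foreground),
        "C": (foreground, foreground),
        "D": (foreground, background),
    }[quadrant]
    mass = sum(p[i][j] for i in rows for j in cols)
    if mass == 0:
        return {}  # empty quadrant: no cell probabilities are defined
    return {(i, j): p[i][j] / mass for i in rows for j in cols}


# Transition probabilities of a hypothetical image with four grey levels.
p = [[1 / 16] * 4 for _ in range(4)]
print(quadrant_probabilities(p, 0))
# {'A': 0.0625, 'B': 0.1875, 'C': 0.5625, 'D': 0.1875}
```

Within each non-empty quadrant the cell probabilities sum to one, which is what allows an entropy to be defined per quadrant.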


2.3  LE,  JE  and  GE  methods 


Three entropies can be derived on the basis of the cell probabilities defined by (4) and (5), each of which yields a different measure. The first two, called LE and JE, were proposed by Pal and Pal [19]. The third one is a new definition, which will be referred to as GE.


          j <= t    j > t
i <= t    A (BB)    B (BF)
i > t     D (FB)    C (FF)

Fig. 1  Four quadrants of a co-occurrence matrix
2.3.1 Local entropy: As local quadrants A and C contain local transitions from background to background (BB) and foreground to foreground (FF), respectively, the local transition entropy of BB, denoted by $H_{BB}(t)$, and the local transition entropy of FF, denoted by $H_{FF}(t)$, can be defined, respectively, by

$$H_{BB}(t) = -\sum_{i=0}^{t} \sum_{j=0}^{t} p_{ij|A}^t \log p_{ij|A}^t \tag{6}$$

$$H_{FF}(t) = -\sum_{i=t+1}^{L-1} \sum_{j=t+1}^{L-1} p_{ij|C}^t \log p_{ij|C}^t \tag{7}$$

where both $H_{BB}(t)$ and $H_{FF}(t)$ are determined by the threshold $t$; thus they are functions of $t$.

By summing up the local transition entropies of foreground and background, Pal and Pal derived the so-called LE, denoted by $H_{LE}(t)$, as follows.

$$H_{LE}(t) = H_{BB}(t) + H_{FF}(t) \tag{8}$$

Obviously, $H_{LE}(t)$ describes the grey-level transition entropy of the local quadrants A and C. Thus, it is more precisely called ‘local transition entropy’ to reflect the characteristic of quadrants A and C. The LE method proposed by Pal and Pal [19] finds a grey level value specified by

$$t_{LE} = \arg\left\{ \max_{t \in G} H_{LE}(t) \right\} \tag{9}$$

which maximises $H_{LE}(t)$ defined by (8) over $t$.

2.3.2 Joint entropy: Alternatively, the joint quadrants B and D provide edge information about joint transitions from background to foreground (BF) and foreground to background (FB). In analogy with LE, another entropy, called JE and denoted by $H_{JE}(t)$, was also derived by Pal and Pal. It is the sum of the joint transition entropy $H_{BF}(t)$ resulting from the joint quadrant B and the joint transition entropy $H_{FB}(t)$ resulting from the joint quadrant D, and is defined as follows.

$$H_{BF}(t) = -\sum_{i=0}^{t} \sum_{j=t+1}^{L-1} p_{ij|B}^t \log p_{ij|B}^t \tag{10}$$

$$H_{FB}(t) = -\sum_{i=t+1}^{L-1} \sum_{j=0}^{t} p_{ij|D}^t \log p_{ij|D}^t \tag{11}$$

$$H_{JE}(t) = H_{BF}(t) + H_{FB}(t) \tag{12}$$

Similarly, $H_{JE}(t)$ is more accurately called ‘joint transition entropy’ to reflect the grey-level transition activities in the joint quadrants B and D. A method of finding $t_{JE}$ that maximises $H_{JE}(t)$ defined by (12) over $t$ is called the JE method, which is

$$t_{JE} = \arg\left\{ \max_{t \in G} H_{JE}(t) \right\} \tag{13}$$

2.3.3 Global entropy: The GE $H_{GE}(t)$ defined below is simply the sum of the LE $H_{LE}(t)$ and the JE $H_{JE}(t)$, that is

$$H_{GE}(t) = H_{LE}(t) + H_{JE}(t) = H_{BB}(t) + H_{FF}(t) + H_{BF}(t) + H_{FB}(t) \tag{14}$$

Finding a value $t_{GE}$ that maximises $H_{GE}(t)$ defined by (14) over $t$ via the following equation

$$t_{GE} = \arg\left\{ \max_{t \in G} H_{GE}(t) \right\} \tag{15}$$

is called the GE thresholding method. It should be noted that the GE defined by (14) was not defined by Pal and Pal [19]. However, it turns out to be a counterpart of Chang et al.’s [24] GRE. Because GE is the sum of LE and JE, it can be expected that the performance based on GE will lie between those of LE and JE. The experiments seem to justify this claim. So, when it is uncertain which one should be chosen, GE may be a good candidate for a compromise.
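The three criteria of this section can be sketched together in Python. The code below assumes the transition probabilities are available as a nested list `p[i][j]`, uses the natural logarithm, and treats a quadrant with zero probability mass as contributing zero entropy; these conventions and all function names are assumptions of this sketch and are not taken from Pal and Pal [19].

```python
import math


def quadrant_entropy(p, rows, cols):
    """Entropy of the cell probabilities p_ij / P_Q inside one quadrant."""
    mass = sum(p[i][j] for i in rows for j in cols)
    if mass <= 0.0:
        return 0.0  # assumption: an empty quadrant contributes no entropy
    entropy = 0.0
    for i in rows:
        for j in cols:
            if p[i][j] > 0.0:
                cell = p[i][j] / mass
                entropy -= cell * math.log(cell)
    return entropy


def transition_entropies(p, t):
    """Return (H_LE, H_JE, H_GE) of (8), (12) and (14) for threshold t."""
    levels = len(p)
    bg, fg = range(t + 1), range(t + 1, levels)
    h_le = quadrant_entropy(p, bg, bg) + quadrant_entropy(p, fg, fg)
    h_je = quadrant_entropy(p, bg, fg) + quadrant_entropy(p, fg, bg)
    return h_le, h_je, h_le + h_je


def entropy_thresholds(p):
    """Thresholds maximising LE, JE and GE over t in G, as in (9), (13), (15)."""
    candidates = range(len(p))
    return {
        name: max(candidates, key=lambda t: transition_entropies(p, t)[k])
        for k, name in enumerate(("LE", "JE", "GE"))
    }


# Hypothetical transition probabilities for an image with four grey levels.
p = [[1 / 16] * 4 for _ in range(4)]
print(entropy_thresholds(p))
```

When several thresholds give the same criterion value, `max` returns the smallest of them; this tie-breaking rule is a side effect of the sketch rather than part of the methods.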

3  Relative  entropy  thresholding 

Relative entropy has been used to measure the information distance between two information sources. The smaller the relative entropy is, the closer the two sources are in terms of their probability distributions. As described in the beginning of Section 2, a source can be specified by a probability vector. Now, assume that there are two sources, $X$ and $Y$, each of which has $L$ source alphabets. Let $X$ and $Y$ be specified by the probability vectors $p = (p_1, \ldots, p_L)$ and $h = (h_1, \ldots, h_L)$, respectively. The relative entropy between two sources $X$ and $Y$ via their respective probability vectors $p$ and $h$ (or the entropy of $h$ relative to $p$), denoted by $J(p; h)$, is defined by

$$J(p; h) = \sum_{j=1}^{L} p_j \log \frac{p_j}{h_j} \tag{16}$$

The definition given by (16) was first introduced by Kullback [29] as an information distance measure between two probability distributions. It is called Kullback-Leibler’s information discriminant measure, and is also known as cross entropy and directed divergence. It implies that the smaller the relative entropy, the less the discrepancy between $p$ and $h$ and, thus, the better the match between the two probability vectors. Relative entropy can be used to measure the distance between an image and a thresholded image. It is worth noting that the relative entropy is not symmetric, that is, $J(p; h) \ne J(h; p)$. In this paper, the original image is always designated as the nominal image $p$, and the thresholded image is $h$, the one which tries to match the original image.
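The definition in (16) and its lack of symmetry can be checked with a few lines of Python. In this sketch the function name is hypothetical, the natural logarithm is assumed, terms with a zero probability in the first vector are taken to contribute nothing, and the value is reported as infinite when the first vector puts mass where the second has none.

```python
import math


def relative_entropy(p, h):
    """J(p; h) = sum_j p_j log(p_j / h_j) for two probability vectors."""
    total = 0.0
    for p_j, h_j in zip(p, h):
        if p_j > 0.0:
            if h_j <= 0.0:
                return math.inf  # p has mass where h has none
            total += p_j * math.log(p_j / h_j)
    return total


p = [0.5, 0.5]
h = [0.9, 0.1]
print(relative_entropy(p, h))  # about 0.51
print(relative_entropy(h, p))  # about 0.37, so J(p; h) != J(h; p)
```

Because the two directions differ, the choice of which vector plays the role of the nominal image matters when the measure is used as a thresholding criterion.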

3.1 Kittler and Illingworth’s MET

The concept of using relative entropy as a thresholding criterion was first suggested by Kittler and Illingworth [20], who assumed that an image could be modelled by a mixture of two Gaussian distributions describing the background and foreground, respectively. More specifically, let $p_{true} = (p_{0|true}, p_{1|true}, \ldots, p_{L-1|true})$ be an image histogram. Assume that $t$ is a threshold value used to segment the image into background and foreground, both of which are also modelled by Gaussian distributions, $p_B(t)$ and $p_F(t)$, respectively. Define $p_{mix}(t)$ as a mixture of these two Gaussian distributions by

$$p_{mix}(t) = \alpha p_B(t) + (1 - \alpha) p_F(t) \tag{17}$$

where $\alpha$ is determined by the portions of background and foreground in the image. Kittler and Illingworth’s MET finds a grey level value $t_{MET}$ that minimises the mismatch between $p_{true}$ and $p_{mix}(t)$ over $t$, that is

$$t_{MET} = \arg\left\{ \min_{t \in G} J(p_{true}; p_{mix}(t)) \right\} \tag{18}$$

where $J(p_{true}; p_{mix}(t))$ is the relative entropy defined by (16), which measures the discrepancy between the two probability vectors $p_{true}$ and $p_{mix}(t)$. As expected, if the background and foreground are well separated in terms of grey levels, Kittler and Illingworth’s MET may work well. Unfortunately, this assumption is generally not true in many practical applications. Pal and Pal [22] also proposed a Poisson model approach to improve the Gaussian model in MET.
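The relative entropy reading of MET described above can be sketched in Python as follows. This is only an illustration of (17) and (18) as stated in this section, not Kittler and Illingworth’s published criterion function. The sketch assumes that the weight α is the background mass of the histogram, that each Gaussian takes the mean and variance of its side of the histogram, that the mixture is renormalised over the L grey levels, and that a small variance floor (`var_floor`, a hypothetical parameter) guards against single-level classes.

```python
import math


def gaussian(g, mean, var):
    """Gaussian density with the given mean and variance, evaluated at g."""
    return math.exp(-((g - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def class_statistics(p, levels):
    """Weight, mean and variance of the histogram restricted to `levels`."""
    weight = sum(p[g] for g in levels)
    if weight <= 0.0:
        return 0.0, 0.0, 0.0
    mean = sum(g * p[g] for g in levels) / weight
    var = sum((g - mean) ** 2 * p[g] for g in levels) / weight
    return weight, mean, var


def met_threshold(p_true, var_floor=1e-6):
    """Threshold minimising J(p_true; p_mix(t)) over t, in the spirit of (18)."""
    levels = len(p_true)
    best_t, best_j = None, math.inf
    for t in range(levels - 1):
        alpha, mean_b, var_b = class_statistics(p_true, range(t + 1))
        beta, mean_f, var_f = class_statistics(p_true, range(t + 1, levels))
        if alpha <= 0.0 or beta <= 0.0:
            continue  # one class is empty, so no two-member mixture exists
        var_b, var_f = max(var_b, var_floor), max(var_f, var_floor)
        mix = [alpha * gaussian(g, mean_b, var_b)
               + (1.0 - alpha) * gaussian(g, mean_f, var_f)
               for g in range(levels)]
        total = sum(mix)  # used to renormalise the mixture over the grey levels
        j = 0.0
        for g in range(levels):
            if p_true[g] > 0.0:
                if mix[g] <= 0.0:
                    j = math.inf
                    break
                j += p_true[g] * math.log(p_true[g] * total / mix[g])
        if j < best_j:
            best_t, best_j = t, j
    return best_t
```

For a histogram made of two well-separated bumps, the selected threshold is expected to fall in the valley between them, which matches the remark that MET works well when background and foreground are well separated.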


3.2 Grey-level co-occurrence matrix used for relative entropic thresholding

As noted for Kittler and Illingworth’s MET, their method is based solely on the grey-level histogram of an image, which does not take into consideration the correlation among grey levels. This leads to the idea of using the co-occurrence matrix to extend the MET, called second-order relative entropy, as opposed to the MET, which is referred to as first-order relative entropy. In this case, the $p = (p_1, \ldots, p_L)$ and $h = (h_1, \ldots, h_L)$ defined in (16) are replaced, respectively, by the grey-level transition probabilities $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ generated by the co-occurrence matrix of the original image and the grey-level transition probabilities $\{h_{ij}^t\}_{i=0,j=0}^{L-1,L-1}$ generated by the co-occurrence matrix of a thresholded image. The transition probabilities defined by the co-occurrence matrix contain the spatial information that reflects homogeneity of local grey-level transitions in quadrants A and C, and joint grey-level transitions across boundaries in joint quadrants B and D.

Let the second-order relative entropy of the grey-level transition probabilities $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ and $\{h_{ij}^t\}_{i=0,j=0}^{L-1,L-1}$ be defined by

$$J(\{p_{ij}\}; \{h_{ij}^t\}) = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p_{ij} \log \frac{p_{ij}}{h_{ij}^t} \tag{19}$$

where $p_{ij}$ is the transition probability from grey level $i$ to grey level $j$ of the original image and $h_{ij}^t$ is the transition probability generated by the thresholded binary image in response to $p_{ij}$. Despite the fact that the thresholded binary image has only grey level values of 0 (background) and 1 (foreground), it should be noted that the subscript $ij$ of $h_{ij}^t$ corresponds to the same $ij$ used as the subscript of $p_{ij}$. Using (19) as a thresholding criterion to minimise $J(\{p_{ij}\}; \{h_{ij}^t\})$ over $t$ generally renders a thresholded binary image that best matches the original image.

Suppose that a threshold value $t$ is selected for binarisation. By assigning 1 to all grey levels above $t$, $G_1 = \{t+1, \ldots, L-1\}$, and 0 to all grey levels equal to or below $t$, $G_0 = \{0, \ldots, t\}$, we obtain a binary image. Further assume that the grey levels in $G_0$ and $G_1$ are uniformly distributed in their respective regions. The resulting $h_{ij}^t$ for each quadrant can be found by

$$h_{ij|A}^t = q_A^t = \frac{P_A^t}{(t+1)(t+1)} \quad \text{for } i, j \in G_0 \tag{20}$$

$$h_{ij|B}^t = q_B^t = \frac{P_B^t}{(t+1)(L-t-1)} \quad \text{for } i \in G_0 \text{ and } j \in G_1 \tag{21}$$

$$h_{ij|C}^t = q_C^t = \frac{P_C^t}{(L-t-1)(L-t-1)} \quad \text{for } i \in G_1 \text{ and } j \in G_1 \tag{22}$$

$$h_{ij|D}^t = q_D^t = \frac{P_D^t}{(L-t-1)(t+1)} \quad \text{for } i \in G_1 \text{ and } j \in G_0 \tag{23}$$

where $P_A^t$, $P_B^t$, $P_C^t$ and $P_D^t$ were defined by (4). For each selected $t$, $h_{ij|A}^t$, $h_{ij|B}^t$, $h_{ij|C}^t$ and $h_{ij|D}^t$ are constants in each individual quadrant and depend only on the quadrant to which they belong. They can therefore be written as $q_A^t$, $q_B^t$, $q_C^t$ and $q_D^t$, respectively, which represent the conditional probabilities of each of the four quadrants produced by $h_{ij}^t$.
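The piecewise-constant transition probabilities of the thresholded image can be sketched in Python as below. The sketch assumes the transition probabilities of the original image are stored as a nested list `p[i][j]` and that each quadrant’s probability mass is spread evenly over its cells, as in the uniformity assumption above; the function names and the placeholder value used for a quadrant without cells are assumptions of this example.

```python
def quadrant_constants(p, t):
    """Constants q_A, q_B, q_C and q_D for threshold t."""
    levels = len(p)
    n_bg, n_fg = t + 1, levels - t - 1
    bg, fg = range(n_bg), range(n_bg, levels)
    mass = {
        "A": sum(p[i][j] for i in bg for j in bg),
        "B": sum(p[i][j] for i in bg for j in fg),
        "C": sum(p[i][j] for i in fg for j in fg),
        "D": sum(p[i][j] for i in fg for j in bg),
    }
    cells = {"A": n_bg * n_bg, "B": n_bg * n_fg, "C": n_fg * n_fg, "D": n_fg * n_bg}
    # A quadrant with no cells (t = L - 1) gets the placeholder value 0.0.
    return {k: (mass[k] / cells[k] if cells[k] else 0.0) for k in mass}


def thresholded_probabilities(p, t):
    """Matrix h[i][j] that is constant within each quadrant."""
    q = quadrant_constants(p, t)
    levels = len(p)

    def quadrant(i, j):
        if i <= t:
            return "A" if j <= t else "B"
        return "D" if j <= t else "C"

    return [[q[quadrant(i, j)] for j in range(levels)] for i in range(levels)]


# With two grey levels every quadrant is a single cell, so h reproduces p.
p = [[0.4, 0.1], [0.2, 0.3]]
print(thresholded_probabilities(p, 0))  # [[0.4, 0.1], [0.2, 0.3]]
```

Since each quadrant keeps its total mass, the entries of the resulting matrix still sum to one, so it is a valid probability distribution to compare against the original transition probabilities.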


3.3  Three  relative  entropy-based  methods 

Expanding (19) yields

$$J(\{p_{ij}\}; \{h_{ij}^t\}) = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p_{ij} \log \frac{p_{ij}}{h_{ij}^t} = -H(\{p_{ij}\}) - \sum_{i,j} p_{ij} \log h_{ij}^t \tag{24}$$

where $H(\{p_{ij}\})$ is the entropy of the probability vector specified by $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ and is independent of $t$. The relative entropy $J(\{p_{ij}\}; \{h_{ij}^t\})$ in (24) measures the discrepancy between the two probability vectors specified by $\{p_{ij}\}_{i=0,j=0}^{L-1,L-1}$ and $\{h_{ij}^t\}_{i=0,j=0}^{L-1,L-1}$, which describe the original image and the thresholded image, respectively. So, the smaller the $J(\{p_{ij}\}; \{h_{ij}^t\})$, the better the approximation of $\{h_{ij}^t\}$ to $\{p_{ij}\}$. Therefore the best threshold will be the one that yields the smallest value of $J(\{p_{ij}\}; \{h_{ij}^t\})$. However, minimising $J(\{p_{ij}\}; \{h_{ij}^t\})$ on the left-hand side of (24) is equivalent to maximising the sum $\sum_{i,j} p_{ij} \log h_{ij}^t$ in the second term on the right-hand side of (24), which can be further reduced to

$$P_A^t \log q_A^t + P_B^t \log q_B^t + P_C^t \log q_C^t + P_D^t \log q_D^t \tag{25}$$

So, in order to minimise (24) over $t$, we only have to maximise (25) over $t$. In analogy with Section 2.3, three different relative entropies can be defined via (25).
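The reduction from (24) to (25) suggests a direct way to search for the threshold: evaluate expression (25) for every candidate t and keep the maximiser. The Python sketch below does this under stated assumptions: transition probabilities are stored as a nested list `p[i][j]`, the natural logarithm is used, quadrants with zero mass follow the convention 0 log 0 = 0, and the function names are hypothetical.

```python
import math


def expression_25(p, t):
    """Value of P_A log q_A + P_B log q_B + P_C log q_C + P_D log q_D."""
    levels = len(p)
    bg, fg = range(t + 1), range(t + 1, levels)
    quadrants = [(bg, bg), (bg, fg), (fg, fg), (fg, bg)]  # A, B, C, D
    value = 0.0
    for rows, cols in quadrants:
        mass = sum(p[i][j] for i in rows for j in cols)
        if mass > 0.0:  # convention 0 log 0 = 0 for empty quadrants
            q = mass / (len(rows) * len(cols))
            value += mass * math.log(q)
    return value


def threshold_maximising_25(p):
    """Threshold t in G that maximises expression (25), hence minimises (24)."""
    return max(range(len(p)), key=lambda t: expression_25(p, t))


# Hypothetical transition probabilities that are constant on the quadrants of
# t = 1, so the thresholded approximation is exact at that threshold.
a, b = 0.1, 0.025
p = [[a, a, b, b], [a, a, b, b], [b, b, a, a], [b, b, a, a]]
print(threshold_maximising_25(p))  # 1
```

In this toy case the relative entropy in (24) is zero at t = 1, so expression (25) equals the negative entropy of the transition probabilities there and is strictly smaller for every other threshold.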

3.3. 1  GRE  thresholding:  Equation  (25)  is  identical  to  the 
one  proposed  by  Chang  et  al.  [24]  and  is  referred  to  as  GRE, 
HGRE(t)  here  and  is  expressed  as  follows 

^gre(0  =  —  (Pa  l°g  Va  +  Pb  l°g  ? b  +  l°g  Ec 

+  P‘D  log  4d)  (26) 

It describes the global feature of grey-level transitions in the image. So, the GE defined by (14) in entropy thresholding can be viewed as its counterpart. Finding a threshold value $t_{GRE}$ that minimises (26) is called the GRE thresholding method, that is

$$ t_{GRE} = \arg\min_{t \in G} H_{GRE}(t) \tag{27} $$

3.3.2 LRE thresholding: Analogous to Pal and Pal's LE, we can also define its counterpart in relative entropy, called LRE, via (26). It was originally proposed by Lee et al. [27], in which $\{P^t_A, P^t_C\}$ did not constitute a probability distribution. In order to make it a probability distribution, extra care must be taken by normalising the probabilities in the local quadrants A and C. If we define $p_{ij|AC} = p_{ij}/(P^t_A + P^t_C)$, then the correct version of LRE is given by

$$ J_{LRE}(\{p_{ij|AC}\}; h^t_{ij}) = \sum_{(i,j) \in BB \cup FF} p_{ij|AC} \log \frac{p_{ij|AC}}{h^t_{ij}} = -H_{BB+FF}(t) - \sum_{(i,j) \in BB \cup FF} p_{ij|AC} \log h^t_{ij} \tag{28} $$

where

$$ H_{BB+FF}(t) = -\sum_{(i,j) \in BB \cup FF} p_{ij|AC} \log p_{ij|AC} \tag{29} $$

is the entropy of local quadrants A and C in the co-occurrence matrix W. With $h^t_{ij}$ normalised by $P^t_A + P^t_C$ in the same way as $p_{ij}$, the second term in (28) can be further reduced to

$$ \sum_{(i,j) \in BB \cup FF} p_{ij|AC} \log h^t_{ij} = \sum_{(i,j) \in BB} p_{ij|AC} \log\left( \frac{q^t_A}{P^t_A + P^t_C} \right) + \sum_{(i,j) \in FF} p_{ij|AC} \log\left( \frac{q^t_C}{P^t_A + P^t_C} \right) $$

$$ = \frac{P^t_A}{P^t_A + P^t_C} \log\left( \frac{q^t_A}{P^t_A + P^t_C} \right) + \frac{P^t_C}{P^t_A + P^t_C} \log\left( \frac{q^t_C}{P^t_A + P^t_C} \right) \tag{30} $$

Substituting (30) into (28) results in

$$ J_{LRE}(\{p_{ij|AC}\}; h^t_{ij}) = -H_{BB+FF}(t) - \left[ \frac{P^t_A}{P^t_A + P^t_C} \log\left( \frac{q^t_A}{P^t_A + P^t_C} \right) + \frac{P^t_C}{P^t_A + P^t_C} \log\left( \frac{q^t_C}{P^t_A + P^t_C} \right) \right] \tag{31} $$

It should be noted that $H_{BB+FF}(t)$ given by (29) is different from $H_{LE}(t)$ given by (8), in the sense that the former considers quadrants A and C as one entity and normalises their probabilities to unity jointly, whereas the latter considers quadrants A and C as separate entities and normalises the probabilities in each quadrant to unity separately. Interestingly, $J_{LRE}(\{p_{ij|AC}\}; h^t_{ij})$ in (31) captures the local features of grey-level transitions within background and foreground, and can be expressed as $-H_{BB+FF}(t)$ minus an extra term given by (30). The LRE thresholding method is to find a threshold value $t_{LRE}$ that minimises $J_{LRE}(\{p_{ij|AC}\}; h^t_{ij})$, that is

$$ t_{LRE} = \arg\min_{t} J_{LRE}(\{p_{ij|AC}\}; h^t_{ij}) \tag{32} $$

3.3.3 JRE thresholding: The JE also has its counterpart in relative entropy, JRE, which measures the information of joint features of grey-level transitions from background to foreground and from foreground to background. Like the LRE, the JRE defined by Lee et al. [27] was not correct in terms of probability distribution. As for the LRE, a normalisation $p_{ij|BD} = p_{ij}/(P^t_B + P^t_D)$ must be included to normalise the probabilities in the joint quadrants B and D. The correct JRE is given by

$$ J_{JRE}(\{p_{ij|BD}\}; h^t_{ij}) = \sum_{(i,j) \in BF \cup FB} p_{ij|BD} \log \frac{p_{ij|BD}}{h^t_{ij}} = -H_{BF+FB}(t) - \sum_{(i,j) \in BF \cup FB} p_{ij|BD} \log h^t_{ij} \tag{33} $$

Following the same reduction as in (30) and (31), this becomes

$$ J_{JRE}(\{p_{ij|BD}\}; h^t_{ij}) = -H_{BF+FB}(t) - \left[ \frac{P^t_B}{P^t_B + P^t_D} \log\left( \frac{q^t_B}{P^t_B + P^t_D} \right) + \frac{P^t_D}{P^t_B + P^t_D} \log\left( \frac{q^t_D}{P^t_B + P^t_D} \right) \right] \tag{34} $$

where

$$ H_{BF+FB}(t) = -\sum_{(i,j) \in BF \cup FB} p_{ij|BD} \log p_{ij|BD} \tag{35} $$

is the entropy of the joint quadrants B and D in the co-occurrence matrix W. So, finding a threshold $t_{JRE}$ that minimises $J_{JRE}(\{p_{ij|BD}\}; h^t_{ij})$ is called the JRE thresholding method, that is

$$ t_{JRE} = \arg\min_{t} J_{JRE}(\{p_{ij|BD}\}; h^t_{ij}) \tag{36} $$

One comment is noteworthy. The LRE and JRE originally defined by Lee et al. [27] are not based on conditional probability distributions, as they are not normalised by the probabilities of the two quadrants that constitute LRE and JRE. For this reason, technically, they cannot be called relative entropies, even though they seemed to work well as threshold criteria [27].

4 Histogram compression and translation


It was reported by Ramac and Varshney [26] that Chang et al.'s relative entropy method, GRE, did not perform well for some images. This was mainly due to the fact that their image histograms are distributed sparsely, with large gaps between consecutive grey levels. Unlike entropy-based methods, relative entropy-based methods are generally sensitive to such sparse image histograms. In this case, in order for relative entropy-based methods to work effectively, a sparse image histogram must be compressed to a more compact histogram. This idea is called histogram compression and translation (HCT), which is very similar to the commonly used histogram equalisation. However, instead of stretching a 1-D image histogram to cover the entire grey-level range as histogram equalisation does, the HCT does the opposite and compresses the 2-D histogram, which records the relationship of one grey level relative to another. This is a major difference between histogram equalisation and the HCT: the former deals with a 1-D image histogram, whereas the latter has to take into account the relative spatial relationship characterised by a 2-D histogram resulting from a co-occurrence matrix, in which case the image histogram must be compressed rather than stretched. In what follows, we develop a method for this purpose.

Suppose that the total number of distinct grey levels in an image is $N$. Without loss of generality, we assume that $g_1, g_2, \ldots, g_N$ are these $N$ distinct grey levels, arranged so that $g_1 < g_2 < \cdots < g_N$, where $g_1 = g_{\min}$ is the smallest grey level and $g_N = g_{\max}$ is the largest grey level. Let $n(g_k)$ be the total number of pixels in the image whose grey level is $g_k$. Two parameters will be used to measure the sparseness of a 1-D image histogram. One parameter is $N$. The other is the width of the histogram, defined by $w = g_N - g_1$. In general, $w \ge N$. If the width $w$ of a 1-D image histogram is very close to $N$, then the histogram is dense and distributed compactly. In contrast, if $w$ is much greater than $N$, the histogram is distributed sparsely. In this case, a histogram compression and translation is generally needed for relative entropy-based thresholding methods. The process is referred to as HCT, defined by mapping $g_k \to k$ with

$$ \mathrm{HCT}(g_k) = k \quad \text{and} \quad n_k = n(g_k) \quad \text{for } 1 \le k \le N \tag{37} $$

Using (37), a new HCT-compressed and translated 1-D image histogram can be created for the original image, which is a plot of $n_k$ against $k$ with $1 \le k \le N$.
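The mapping in (37) amounts to replacing each grey level by its rank among the distinct levels present. A minimal sketch is given below, assuming the image is a list of rows of integer grey levels; the function name `hct` and its return layout are hypothetical, and the width is computed literally as $g_N - g_1$.

```python
def hct(image):
    """Histogram compression and translation following (37).

    image: list of rows of integer grey levels.
    Returns (compressed image, histogram n_k, N, w).
    """
    # Distinct grey levels g_1 < g_2 < ... < g_N
    levels = sorted({g for row in image for g in row})
    # HCT(g_k) = k, with k starting at 1
    rank = {g: k for k, g in enumerate(levels, start=1)}
    compressed = [[rank[g] for g in row] for row in image]
    # n_k = n(g_k): pixel count of each compressed level
    counts = {}
    for row in compressed:
        for k in row:
            counts[k] = counts.get(k, 0) + 1
    n_levels = len(levels)            # N
    width = levels[-1] - levels[0]    # w = g_N - g_1
    return compressed, counts, n_levels, width
```

Comparing the returned `width` with `n_levels` gives a quick indication of whether the histogram is sparse enough for HCT to matter.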
5 Performance measures

In  order  to  avoid  human  interpretation,  two  objective 
measures,  uniformity  and  shape  [1,  2],  will  be  used  for 
performance  evaluation. 


5.1 Uniformity measure

The uniformity measure is generally used to describe region homogeneity in an image. For a given threshold $t$, it is defined by

$$ U(t) = 1 - \frac{\sigma_B^2(t) + \sigma_F^2(t)}{C} \tag{38} $$

where B and F represent the background and foreground regions, $f(x, y)$ is the grey level of the pixel $(x, y)$,

$$ C = \frac{(g_{\max} - g_{\min})^2}{2}, \quad \mu_B = \frac{\sum_{(x,y) \in B} f(x,y)}{n_B}, \quad \mu_F = \frac{\sum_{(x,y) \in F} f(x,y)}{n_F}, $$

$$ \sigma_B^2(t) = \frac{1}{n_B} \sum_{(x,y) \in B} (f(x,y) - \mu_B)^2, \quad \sigma_F^2(t) = \frac{1}{n_F} \sum_{(x,y) \in F} (f(x,y) - \mu_F)^2, $$

$n_B$ is the number of pixels in the background region and $n_F$ is the number of pixels in the foreground region.
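A minimal sketch of (38) follows. It assumes that the background consists of pixels with grey level at most $t$ and the foreground of those above $t$, that each region variance is divided by its pixel count, that $C = (g_{\max} - g_{\min})^2/2$, and that an empty region contributes zero variance; the function name is hypothetical.

```python
def uniformity(image, t):
    """Uniformity U(t) of (38) for an image given as a list of rows."""
    pixels = [g for row in image for g in row]
    background = [g for g in pixels if g <= t]   # region B
    foreground = [g for g in pixels if g > t]    # region F

    def variance(region):
        # Per-pixel variance; an empty region contributes 0 (assumption)
        if not region:
            return 0.0
        mean = sum(region) / len(region)
        return sum((g - mean) ** 2 for g in region) / len(region)

    c = (max(pixels) - min(pixels)) ** 2 / 2.0   # normalisation constant C
    if c == 0.0:
        return 1.0   # flat image: treated as perfectly uniform (assumption)
    return 1.0 - (variance(background) + variance(foreground)) / c
```

With this normalisation a threshold that separates two perfectly flat regions scores 1, and larger within-region spread lowers the score.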

5.2  Shape  measure 

The  shape  measure  is  generally  used  to  measure  geometric 
features  of  objects  present  in  an  image.  It  is  calculated  as 
follows. 

(a) We first define a generalised gradient function $\Delta(x, y)$ by

$$ \Delta(x, y) = \left[ \sum_{k=1}^{4} D_k^2 + \sqrt{2}\, D_1 (D_3 + D_4) - \sqrt{2}\, D_2 (D_3 - D_4) \right]^{1/2} \tag{39} $$

where $D_1 = f(x+1, y) - f(x-1, y)$, $D_2 = f(x, y-1) - f(x, y+1)$, $D_3 = f(x+1, y+1) - f(x-1, y-1)$ and $D_4 = f(x+1, y-1) - f(x-1, y+1)$, and assign its value to every pixel $(x, y)$. It should be noted that the gradient $D_1$ measures the grey-level changes along the x-axis (i.e. the 0°-180° horizontal line), whereas the gradient $D_2$ measures the grey-level changes along the y-axis (i.e. the 90°-270° vertical line). Additionally, the gradient $D_3$ measures the grey-level changes diagonally (i.e. along the 45°-225° diagonal line), whereas the gradient $D_4$ measures the grey-level changes anti-diagonally (i.e. along the 135°-315° second diagonal line). Together, these four gradients cover all eight orientations, 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°, which can be used to capture image shape features.

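(b) Second, if the pixel $(x, y)$ has a grey value higher than the average of its neighbours, then assign a '+' sign to $\Delta(x, y)$; otherwise assign a '-' sign.

(c) Third, compute the shape measure using the following formula

$$ S(t) = \frac{\sum_{(x,y) \in F} \mathrm{sign}(f(x,y) - f_B(x,y))\, \Delta(x,y)\, \mathrm{sign}(f(x,y) - t)}{C_F} \tag{40} $$

where

$$ \mathrm{sign}(x) = \begin{cases} +1, & \text{if } x \ge 0 \\ -1, & \text{if } x < 0 \end{cases} $$

is the sign function, $C_F$ is a normalisation constant given by

$$ C_F = \max \left\{ \sum_{(x,y) \in F} \mathrm{sign}(f(x,y) - f_B(x,y))\, \Delta(x,y)\, \mathrm{sign}(f(x,y) - t) \right\} $$

and $f_B(x,y)$ is the average grey level of the eight neighbours of $(x, y)$,

$$ f_B(x,y) = \frac{1}{8} \left[ \sum_{i=x-1}^{x+1} \sum_{j=y-1}^{y+1} f(i,j) - f(x,y) \right] $$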
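The two per-pixel quantities used in steps (a) to (c), the generalised gradient of (39) and the eight-neighbour average, can be sketched as follows. The sketch assumes an interior pixel, an image indexed as `f[x][y]`, and a square root taken over the whole bracket in (39); the function names are hypothetical, and the normalisation constant $C_F$ is deliberately left out.

```python
import math

def generalised_gradient(f, x, y):
    """Generalised gradient of (39) at an interior pixel (x, y)."""
    d1 = f[x + 1][y] - f[x - 1][y]           # 0-180 degree direction
    d2 = f[x][y - 1] - f[x][y + 1]           # 90-270 degree direction
    d3 = f[x + 1][y + 1] - f[x - 1][y - 1]   # 45-225 degree diagonal
    d4 = f[x + 1][y - 1] - f[x - 1][y + 1]   # 135-315 degree diagonal
    inside = (d1 * d1 + d2 * d2 + d3 * d3 + d4 * d4
              + math.sqrt(2.0) * d1 * (d3 + d4)
              - math.sqrt(2.0) * d2 * (d3 - d4))
    # The bracket is a sum of two squares; clamp tiny negative round-off
    return math.sqrt(max(inside, 0.0))

def neighbour_mean(f, x, y):
    """Average grey level of the eight neighbours of (x, y)."""
    window = sum(f[i][j] for i in (x - 1, x, x + 1) for j in (y - 1, y, y + 1))
    return (window - f[x][y]) / 8.0
```

Border pixels would need separate handling (for example skipping or padding them), which the text does not specify.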


6  Experiments 

In this section, seven entropy-based thresholding methods (LE, JE, GE, Kittler and Illingworth's MET, LRE, JRE, GRE) will be implemented and compared via a series of experiments. They can be categorised into three groups. The first group contains one first-order thresholding method, Kittler and Illingworth's MET, which relies on 1-D image histograms without taking into account inter-pixel spatial correlation. The second and third groups are made up of second-order thresholding methods, which utilise a co-occurrence matrix to account for spatial correlation among pixels. The second group comprises three entropy thresholding methods, LE, JE and GE, and the third group consists of three relative entropy thresholding methods, LRE, JRE and GRE, which are considered to be counterparts of the methods in the second group.

In order to make our comparative study more complete, we also include Otsu's [28] method as a benchmark comparison. Otsu's method is a widely used thresholding method and has been shown to perform well in general. It is based on a criterion that maximises the ratio of between-class variance to within-class variance and can be described briefly as follows.

6.1 Otsu's method

Otsu's method is a special case of two-class Fisher's linear discriminant analysis (LDA) in pattern classification [31], where the optimal criterion is the ratio of between-class variance to within-class variance. Let the 1-D histogram of an image be described by a probability vector $(p_0, p_1, \ldots, p_{L-1})$, where $p_i = n_i/n$, $n_i$ is the number of pixels with grey-level value $i$ and $n$ is the number of image pixels. Suppose that $t$ is a selected threshold value. Then the probabilities of the background and foreground of the thresholded binary image can be defined by

$$ P^t_B = \sum_{i=0}^{t} p_i \quad \text{and} \quad P^t_F = 1 - P^t_B = \sum_{i=t+1}^{L-1} p_i \tag{41} $$

Using (41), the means and variances associated with the background and foreground can be further defined, respectively, as follows

$$ \mu_B = \frac{1}{P^t_B} \sum_{i=0}^{t} i p_i \quad \text{and} \quad \mu_F = \frac{1}{P^t_F} \sum_{i=t+1}^{L-1} i p_i \tag{42} $$

$$ \mathrm{var}_B = \frac{1}{P^t_B} \sum_{i=0}^{t} (i - \mu_B)^2 p_i \quad \text{and} \quad \mathrm{var}_F = \frac{1}{P^t_F} \sum_{i=t+1}^{L-1} (i - \mu_F)^2 p_i \tag{43} $$

It further considers the between-class variance and within-class variance, defined as in Fisher's LDA by

$$ \mathrm{var}_{\text{between-class}} = P^t_B (\mu_B - \mu)^2 + P^t_F (\mu_F - \mu)^2 = P^t_B P^t_F (\mu_B - \mu_F)^2 \tag{44} $$

where $\mu = \sum_{i=0}^{L-1} i p_i$ is the global mean of the image and

$$ \mathrm{var}_{\text{within-class}} = P^t_B \mathrm{var}_B + P^t_F \mathrm{var}_F \tag{45} $$

It then finds a threshold value $t_{Otsu}$ that maximises $\mathrm{var}_{\text{between-class}}$, or equivalently minimises $\mathrm{var}_{\text{within-class}}$, that is

$$ t_{Otsu} = \arg\max_{t} \{ \mathrm{var}_{\text{between-class}} \} = \arg\min_{t} \{ \mathrm{var}_{\text{within-class}} \} \tag{46} $$

Conceptually, Otsu's idea is very similar to Pal and Pal's JE method. If we interpret the local quadrants in Fig. 1 as within-class quadrants and the joint quadrants in Fig. 1 as between-class quadrants, Otsu's method is essentially similar to the JE and JRE methods, despite the fact that they are technically different methods. Otsu's method is a first-order method, which uses the 1-D image histogram to form within-class and between-class variances and maximises the between-class variance, whereas JE and JRE are second-order methods, which maximise the entropy and relative entropy of joint quadrants of a co-occurrence matrix, respectively. Additionally, the measure used in Otsu's method is variance, compared with the self-information (i.e. $-\log p_{ij}$) in the joint quadrant used in the JE and the discrepancy of self-information between two joint quadrants (i.e. $\log(p_{ij|BD}/h^t_{ij}) = (-\log h^t_{ij}) - (-\log p_{ij|BD})$) used in the JRE method.

More interestingly, if $\sigma_B^2(t)$ and $\sigma_F^2(t)$ in (38) are taken as sums of squared deviations over B and F, without the factors $1/n_B$ and $1/n_F$, they can be re-expressed as

$$ \sigma_B^2(t) = \sum_{(x,y) \in B} (f(x,y) - \mu_B)^2 = \sum_{i=0}^{t} (i - \mu_B)^2 n_i = n \sum_{i=0}^{t} (i - \mu_B)^2 \left( \frac{n_i}{n} \right) = n \sum_{i=0}^{t} (i - \mu_B)^2 p_i = n P^t_B \mathrm{var}_B \tag{47} $$

$$ \sigma_F^2(t) = \sum_{(x,y) \in F} (f(x,y) - \mu_F)^2 = \sum_{i=t+1}^{L-1} (i - \mu_F)^2 n_i = n \sum_{i=t+1}^{L-1} (i - \mu_F)^2 \left( \frac{n_i}{n} \right) = n \sum_{i=t+1}^{L-1} (i - \mu_F)^2 p_i = n P^t_F \mathrm{var}_F \tag{48} $$

From (43), we obtain

$$ \mathrm{var}_{\text{within-class}} = P^t_B \mathrm{var}_B + P^t_F \mathrm{var}_F = \frac{1}{n} \left[ \sigma_B^2(t) + \sigma_F^2(t) \right] \tag{49} $$

By virtue of (49), maximising $U(t)$ in (38) is equivalent to minimising $\mathrm{var}_{\text{within-class}}$, which is also equivalent to maximising $\mathrm{var}_{\text{between-class}}$ according to (46). As a result, the threshold value produced by Otsu's method, $t_{Otsu}$, is identical to the $t$ that maximises $U(t)$ in (38). It should be noted that the values of $U(t)$ vary with images. However, the normalisation constant $C$ in $U(t)$ is independent of the threshold value $t$. In this case, $C$ can be chosen to normalise the values of $U(t)$ to the range [0, 1] such that the minimum and maximum of $U(t)$ for each image are always set to 0 and 1, respectively, for comparison. Using this process, the uniformity values calculated from $U(t)$ in the following experiments always lie between 0 and 1.
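For reference, Otsu's criterion in (41)-(46) can be sketched directly from a 1-D histogram. The sketch below assumes the histogram is a list of pixel counts $n_i$ for grey levels $0, \ldots, L-1$, skips thresholds that leave one class empty, and breaks ties in favour of the smallest $t$; the function names are hypothetical.

```python
def otsu_threshold(hist):
    """Threshold t maximising the between-class variance in (44)."""
    n = float(sum(hist))
    p = [c / n for c in hist]                    # p_i = n_i / n
    mu = sum(i * pi for i, pi in enumerate(p))   # global mean
    best_t, best_var = 0, -1.0
    pb = 0.0   # P^t_B, accumulated over i <= t
    sb = 0.0   # sum of i * p_i over i <= t
    for t in range(len(p) - 1):
        pb += p[t]
        sb += t * p[t]
        pf = 1.0 - pb                            # P^t_F
        if pb <= 0.0 or pf <= 0.0:
            continue                             # one class is empty
        mu_b = sb / pb
        mu_f = (mu - sb) / pf
        var_between = pb * pf * (mu_b - mu_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def within_class_variance(hist, t):
    """P_B var_B + P_F var_F of (45), for comparison with (46)."""
    n = float(sum(hist))
    p = [c / n for c in hist]
    total = 0.0
    for levels in (range(0, t + 1), range(t + 1, len(p))):
        mass = sum(p[i] for i in levels)
        if mass > 0.0:
            mean = sum(i * p[i] for i in levels) / mass
            total += sum((i - mean) ** 2 * p[i] for i in levels)
    return total
```

Evaluating `within_class_variance` over all $t$ and taking the minimum should select the same threshold as `otsu_threshold`, up to ties, which is the equivalence stated in (46).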

6.2  Experiments 

The following experiments are conducted to demonstrate the performance of nine thresholding methods: a classification-based thresholding method, Otsu's method; a first-order entropic thresholding method, Pun and Kapur et al.'s ME; a first-order relative entropic thresholding method, Kittler and Illingworth's MET; three second-order entropy thresholding methods, Pal and Pal's JE and LE, and GE; and three second-order relative entropy thresholding methods, LRE, JRE and GRE, with and without HCT. The uniformity and shape measures were used as objective performance criteria. Additionally, the two parameters $w$ and $N$ were studied to evaluate the need for HCT. Four different images were selected for the experiments.

Experiment 1. Watch: The image studied in this experiment is a watch, shown in Fig. 2a. Its 1-D histograms before and after HCT are nearly the same. They are plotted in Figs. 2b and c, with $w$ and $N$ shown in Fig. 2b. The plots of the co-occurrence matrices of Figs. 2b and c are shown in Figs. 2d and e. The values of uniformity and shape were calculated and are plotted in Figs. 2f and g. Figs. 3a-l show the binary images resulting from ME, MET, Otsu, JE, LE, GE, LRE with HCT, JRE with HCT, GRE with HCT, LRE, JRE and GRE, respectively. As we can see from the thresholded images in Fig. 3, the best results were produced by the MET and Otsu's method in Figs. 3b and c, which outperformed all the second-order entropy and relative entropy thresholding methods. Table 1 tabulates the uniformity and shape values at their threshold values, where the MET and Otsu's method yielded the largest values. As can be seen, most thresholding methods generated higher uniformity values than shape values. This implies that, for the watch image, uniformity had more influence than shape on the thresholded images.

Experiment 2. House: The $w$ and $N$ of the watch image studied in Experiment 1 were approximately the same, so HCT had no impact on the thresholded results. This experiment shows the opposite extreme. The image is shown in Fig. 4a, with its 1-D histograms before and after HCT plotted in Figs. 4b and c, where $w$ and $N$ are also shown in Fig. 4b: $g_1 = g_{\min} = 78$ and $g_N = g_{\max} = 255$, with $N = 68$ and $w = 238$. In this case, the width $w$ is much greater than $N$, with $w/N = 3.5$. The original histogram in Fig. 4b looks very sparse, with grey-level values spread from 78 to 255. In contrast, the HCT-compressed and translated histogram in Fig. 4c was compacted, with grey-level values in a compressed and translated range from 1 to 69. The plots of the co-occurrence matrices of Figs. 4b and c are shown in Figs. 4d and e. As we can see, the inter-pixel spatial correlation between grey-level values in Fig. 4e was much denser than that in Fig. 4d. The values of uniformity and shape were also calculated and plotted in Figs. 4f and g. Figs. 5a-l show the thresholded binary images resulting from the methods of ME, MET, Otsu, JE, LE, GE, LRE with HCT, JRE with HCT, GRE with HCT, LRE, JRE and GRE, respectively. Table 1 also tabulates their respective uniformity and shape values. Apparently, the best thresholded images were those produced by the LE, the LRE with HCT and the LRE, which yielded very high values of the uniformity and shape measures, with the shape values higher than the uniformity values. In contrast to the watch image in Fig. 2a, where uniformity was more important than shape, this observation suggested that the shape of the house image was more crucial than its uniformity. This was also verified by Otsu's method, which generated the highest uniformity value, 1, but a low shape value of 0.6687.

Fig. 2 Watch image
a Watch  b 1-D histogram  c 1-D histogram with HCT  d 2-D histogram  e 2-D histogram after HCT  f Uniformity  g Shape

Fig. 3 Binary thresholded images resulting from various methods
a ME (t = 166)  b MET (t = 63)  c Otsu (t = 81)  d LE (t = 165)  e JE (t = 107)  f GE (t = 165)  g LRE with HCT (t = 101)  h JRE with HCT (t = 109)  i GRE with HCT (t = 102)  j LRE (t = 101)  k JRE (t = 109)  l GRE (t = 102)

IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 6, December 2006

Table 1: Uniformity and shape values resulting from nine thresholding methods in this paper

                  Uniformity                          Shape
Method            Watch    House    Tank     Text     Watch    House    Tank     Text
ME                0.0473   0.9149   0.9043   0.4564   0.0526   0.9449   0.8486   0.7307
MET               0.9439   0.9952   0.8967   0.9212   0.9818   0.6052   0.9957   0.5927
Otsu              1        1        1        1        0.8290   0.6687   0.9258   0.7472
LE                0.0480   0.9098   0.9043   0.5906   0.0535   0.9712   0.8486   0.9124
JE                0.8745   0.9874   0.6746   0.4229   0.4907   0.5849   0.8028   0.6771
GE                0.0480   0.9874   0.8329   0.4638   0.0535   0.5849   0.8182   0.7427
LRE with HCT      0.9007   0.9098   0.9977   0.9106   0.5539   0.9712   0.9331   0.5775
JRE with HCT      0.8653   0.9874   0.6746   0.4229   0.4680   0.5849   0.8028   0.6771
GRE with HCT      0.8959   0.9874   0.9991   0.9866   0.5427   0.5849   0.9108   0.6831
LRE               0.9007   0.9149   0.0656   0.9106   0.5539   0.9712   0.1052   0.5775
JRE               0.0185   0.0032   0.3414   0.2876   0.2666   0.5849   0.7277   0.3735
GRE               0.8959   0.9868   0.0208   0.9866   0.5427   0.5849   0.0289   0.6837

Experiment 3. Tank: This experiment was conducted to show the need for HCT if relative entropy-based thresholding is to be effective. The image is a tank parked on a grass field, shown in Fig. 6a, with $N = 138$ and $w = 212$. In this case, the width $w$ is much greater than $N$, with $w/N \approx 1.5$. The original 1-D histogram in Fig. 6b was compressed and translated by HCT in Fig. 6c. The plots of the co-occurrence matrices of Figs. 6b and c are shown in Figs. 6d and e. As we can see, the inter-pixel spatial correlation among grey-level values in Fig. 6e is much denser than that in Fig. 6d. The values of uniformity and shape were calculated and plotted in Figs. 6f and g. Figs. 7a-l show the binary images resulting from the methods of ME, MET, Otsu, JE, LE, GE, LRE with HCT, JRE with HCT, GRE with HCT, LRE, JRE and GRE, respectively, with their respective uniformity and shape values tabulated in Table 1. Obviously, the relative entropy thresholding methods with HCT performed better than their counterparts without HCT, as shown in Figs. 7g-i and Figs. 7j and k. According to visual inspection, the best results came from the Otsu method, LRE with HCT and GRE with HCT, which also produced the highest values of uniformity and shape. Unlike Experiments 1 and 2, the uniformity and shape measures of the tank image were equally important. For example, the MET method produced the highest shape value, 0.9957, but the fifth highest uniformity value, 0.8967. The thresholded image shown in Fig. 7b was not as good as those in Figs. 7c, g and i, all of which produced uniformity values >0.99 and shape values >0.91.

Fig. 4 House image
a House  b 1-D histogram  c 1-D histogram with HCT  d 2-D histogram  e 2-D histogram after HCT  f Uniformity  g Shape

Fig. 5 Binary thresholded images resulting from various methods
a ME (t = 134)  b MET (t = 156)  c Otsu (t = 145)  d LE (t = 130)  e JE (t = 166)  f GE (t = 166)  g LRE with HCT (t = 130)  h JRE with HCT (t = 166)  i GRE with HCT (t = 166)  j LRE (t = 132)  k JRE (t = 254)  l GRE (t = 166)

Fig. 6 Tank image
a Tank  b 1-D histogram  c 1-D histogram with HCT  d 2-D histogram  e 2-D histogram after HCT  f Uniformity  g Shape

Fig. 7 Binary thresholded images resulting from various methods
a ME (t = 97)  b MET (t = 131)  c Otsu (t = 116)  d LE (t = 96)  e JE (t = 77)  f GE (t = 88)  g LRE with HCT (t = 118)  h JRE with HCT (t = 77)  i GRE with HCT (t = 113)  j LRE (t = 113)  k JRE (t = 53)  l GRE (t = 166)

Experiment 4. Text video image: In Experiments 1-3 we have shown that uniformity and shape provided good objective measures of thresholded results, as expected from the literature. On the basis of the results of the previous experiments, we might jump to the conclusion that a good threshold value should result in high uniformity or shape values. Unfortunately, such a conclusion is misleading and is generally not true. The following experiment offers a counterexample. The image studied in this experiment was a video image, shown in Fig. 8a, with its 1-D histograms without and with HCT and their corresponding co-occurrence matrices plotted in Figs. 8b and c and Figs. 8d and e, respectively. Because the $w$ and $N$ shown underneath Fig. 8b are the same, there was no need to perform HCT. However, $w = N = 256$ suggested that the video image used up all grey-level values to describe the complicated image background, whereas the main scene was simple text shown in the centre of the image. The values of uniformity and shape were calculated and plotted in Figs. 8f and g. Figs. 9a-l show the binary thresholded images resulting from the methods of ME, MET, Otsu, JE, LE, GE, LRE with HCT, JRE with HCT, GRE with HCT, LRE, JRE and GRE, respectively, and their respective uniformity and shape values are also tabulated in Table 1. From the viewpoint of information retrieval and indexing, the best thresholded image is the one produced by the JRE, where the text in the video image was clearly extracted. However, if we compare the uniformity and shape values in Table 1, the JRE yielded the lowest values in both uniformity and shape. This is because the video image in Fig. 8a has a very complicated background, so the effectiveness of the shape and uniformity measures was substantially impaired by low resolution and the distorted image background.

In addition to the previous experiments, an extensive set of experiments was also conducted for performance evaluation of the nine methods described in this paper. Unfortunately, including all of these experiments in this paper is not possible. Instead, we have chosen to include only four representative experiments for illustration. Table 2 summarises these experiments, where a 'yes' under HCT implies that the thresholded image can be improved by relative entropy methods; a 'yes' under uniformity means that uniformity plays a more crucial role in thresholding than does shape, and similarly for a 'yes' under shape. Nevertheless, several observations resulting from our experiments are noteworthy and can be briefly described as follows.


1. No single thresholding technique could claim to be the best method across all the experiments. However, second-order entropy thresholding methods generally performed better than first-order entropy thresholding methods. This is also true for relative entropy thresholding methods.

2. Interestingly, Otsu's method generally performed reasonably well in most of our experiments owing to its classification-based thresholding criterion, which results in the highest uniformity value of 1. Nonetheless, in our experiments, there always existed at least one entropy or relative entropy method that could perform comparably to or better than Otsu's method. This suggested that entropy-based thresholding methods are generally a better approach than traditional thresholding methods.

3. A good threshold value generally produced high uniformity and shape values.

4.  In  most  of  our  experiments,  relative  entropy  thresholding 
methods  with  HCT  performed  better  than  their  counterparts 
without  HCT.  However,  on  some  occasions,  relative 
entropy  thresholding  methods  without  HCT  could  perform 
better  than  their  counterparts  with  HCT.  More  experiments 
for  such  comparison  can  be  found  in  the  work  of  Wang  et  al. 
[32]. 

5.  Owing  to  the  complicated  image  background  shown 
in  Experiment  4,  first-order  thresholding  methods  generally 
performed  poorly  compared  with  second-order  thresholding 
methods,  because  it  requires  second-order  statistical 
information  to  better  capture  background  variations. 
More  interestingly,  Experiment  4  also  demonstrated  that 
for  images  with  complicated  background  the  commonly 
used  objective  measures,  uniformity  and  shape  might  not 
be  good  criteria  to  be  used  for  performance  evaluation 
after  all. 


Fig. 8 Video image with text
a Video image with text  b 1-D histogram  c 1-D histogram after HCT  d 2-D histogram  e 2-D histogram after HCT  f Uniformity  g Shape




Fig. 9 Binary thresholded images resulting from various methods (text)
a ME (t = 172)  b MET (t = 73)  c Otsu (t = 99)  d LE (t = 156)  e JE (t = 177)  f GE (t = 171)  g LRE with HCT (t = 71)  h JRE with HCT (t = 205)  i GRE with HCT (t = 87)  j LRE (t = 71)  k JRE (t = 205)  l GRE (t = 87)


Table 2: Summary of experiments resulting from nine thresholding methods in this paper

Image    (w, N)       HCT    Uniformity   Shape   Best thresholding methods
Watch    (256, 256)   Yes    Yes          No      MET
House    (238, 68)    Yes    No           Yes     LE, LRE w/o HCT
Tank     (212, 138)   Yes    Yes          Yes     LRE with HCT, GRE with HCT, Otsu
Text     (256, 256)   No     No           No      JRE


Table 3: One-to-one correspondence between entropic thresholding and relative entropic thresholding methods

Entropic thresholding methods     Relative entropic thresholding methods
Pun/Kapur et al.'s ME [8, 9]      Kittler and Illingworth's MET [20]
GE                                GRE [24]
LE [19]                           LRE
JE [19]                           JRE

Table 4: Relationship among entropic thresholding and relative entropic thresholding

                                                   Entropic thresholding     Relative entropic thresholding
Criterion (information theoretic measures)         Shannon's entropy         Kullback-Leibler information measure (also known as directed divergence, cross entropy, relative entropy)
First-order methods (histogram-based)              Pun/Kapur et al.'s ME     Kittler and Illingworth's MET
Second-order methods (co-occurrence matrix-based)  GE                        GRE
                                                   LE                        LRE
                                                   JE                        JRE


7  Conclusion 

In this paper, a comprehensive and comparative study of entropy thresholding and relative entropy thresholding techniques is presented. A total of eight different entropy-based information theoretic methods, ME, MET, LE, JE, GE, LRE, JRE and GRE, along with Otsu's method, are considered and evaluated by two objective measures, uniformity and shape. There are several contributions made in this paper. One major contribution is a detailed treatment of entropy thresholding and relative entropy thresholding, with their counterparts tabulated in Table 3 and the corresponding relationship summarised in Table 4. Another contribution is three new thresholding methods: an entropy thresholding method, GE, and two relative entropy thresholding methods, LRE and JRE. A third contribution is the introduction of the HCT to improve relative entropy thresholding methods. A fourth contribution is to show that uniformity and shape are generally good thresholding measures for grey-scale images, but not necessarily for video images.